Email graphs have been used to illustrate general properties of social networks of communication
and collaboration. However, increasingly, the majority of email traffic reflects opportunistic, rather 
than symbiotic social relations. Here we use e-mail data drawn from a large university to construct 
directed graphs of email exchange that quantify the differences between social and antisocial behav- 
iors in networks of communication. We show that while structural characteristics typical of other 
social networks are shared to a large extent by the legitimate component they are not characteristic 
of antisocial traffic. Interestingly, opportunistic patterns of behavior do create nontrivial graphs 
with certain general characteristics that we identify. To complement the graph analysis, which 
suffers from incomplete knowledge of users external to the domain, we study temporal patterns of 
communication to show that the dynamical properties of email traffic can, in principle, distinguish 
different types of social relations. 
I. INTRODUCTION

The fast pace of recent progress in the quantitative 
understanding of complex networks that mediate social 
interactions has been largely due to new ways of harvest- 
ing data, mainly by electronic means. For this reason 
graphs of email communication, where nodes represent 
email users and links denote messages exchanged between 
them, have become important examples of social networks.
The statistical mechanics of these networks makes possi- 
ble a quantification of aspects of human social behavior 
and their comparison to the structure of interactions in 
other complex systems. 

A recent study Q has provided evidence for structural 
properties that are characteristic of social graphs, but 
not of other complex networks. These are a nontrivial 
clustering coefficient (network transitivity) and the pres- 
ence of positive degree correlations (assortative mixing by 
degree) between adjacent nodes. Moreover, it has been 
suggested that social networks can be largely understood 
in terms of the organization of nodes into communities 
@, a feature that can explain, to some extent, 

the observed values for the clustering coefficient and de- 
gree correlations. This observation has indeed led to the 
interesting suggestion that email networks can be used 
to infer informal communities of practice within organi- 
zations fH], as well as their hierarchical structure 0, 13, 0| , 
features that can in principle be useful for the efficient 
management of human collective behavior. In fact, the 
nature of such hierarchies can be quantified 0, Q, and 
may be self-similar [3]. 

Beyond these characteristics that are, at least at the 
qualitative level, general to social networks there are fea- 
tures of email graphs that are specific. The most im- 
portant property of email is the low cost [28] involved in 



delivering a message to a large group of recipients. This 
tends to make communication between any two nodes 
more indiscriminate, as email senders may easily send 
copies of a message to multiple parties that play no ac- 
tive role in the relationship between sender and recipi- 
ent. As such, we may expect that networks of email may 
contain nodes with very high degree, and that degree dis- 
tributions exhibit less severe or no practical constraints 
to their high degree tails. The result, as we show below, 
is that networks of email show no upper cutoff in their 
degree distributions, which are scale free with a small ex-
ponent $\alpha$, and degree correlations that may be atypical
of other social networks.

The ease with which messages can be distributed to 
many recipients is also at the root of most opportunistic 
behavior involving email. In fact, there has been grow- 
ing interest in uncovering evidence of antisocial behavior 
in online networks. Recent work addresses topics such 
as uninhibited remarks, hostile flaming, non-conforming 
behavior, group polarization, and spurious traffic [9, 10].
Email as a means of potential mass distribution is par- 
ticularly associated with the dissemination of computer 
viruses as well as spam traffic [11], which flood the Internet
with unwanted messages usually containing commercial 
propositions or, more recently, a variety of other scams. 
This behavior, which we call generically antisocial, dis- 
plays different characteristics from other types of social 
relations for which social networks have been constructed 
and analyzed. 

In all previous characterizations of email communica- 
tions as networks, the problem that these networks also 
mediate antisocial relations has not been addressed. In 
order to attempt to eliminate such behaviors, as well as 
to deal with incomplete network reconstruction, authors 
have used several strategies, such as restricting the anal-
ysis of email traffic to within the organization's domain
@, 0, S 0, EH , taking into account only links that dis-
play communication in both directions 0, H, 0]> elimi-
nating nodes associated with very high message volumes
0, [H,l3]; and setting minimal message thresholds for a
link to exist.

Here we provide a more complete study of email net- 
works by lifting most of these restrictions. Then email 
networks become directed, and the number of users and 
links in our dataset is dominated by spam traffic. What 
is conceptually interesting about spam email is that it 
nevertheless displays quantitative graph theoretical and 
dynamical characteristics that are nontrivial. Moreover, 
these characteristics reflect a certain type of antisocial be- 
havior that can be quantitatively characterized and con- 
trasted to the general properties of other social networks. 

The remainder of this manuscript is organized as fol-
lows. In section II we give details about our data and
the several networks of social and antisocial behavior
constructed. We then proceed to analyze them via stan-
dard network measures for which we expect antisocial
behavior to differ from social behavior. In section III we give an
additional characterization of the temporal structure of 
time series of email and show that social and anti-social 
traffics differ in several characteristic ways. Finally we 
present our conclusions. 
FIG. 1: Average degree power-law distribution for social (top)
and antisocial (bottom) networks. 



II. NETWORK INFERENCE AND 
STRUCTURAL ANALYSIS 

To construct networks of email communication we con- 
sider the email traffic from a department of a large univer- 
sity. Email messages arriving at the departmental server 
are classified either as spam or legitimate by SpamAssas-
sin, a standard and widely used filtering software.
We construct four graphs representing different email 
networks. A social network is built from the legitimate 
(as classified by SpamAssassin) messages exchanged be- 
tween all users, including those external to the depart- 
ment that send/receive e-mails to/from internal users. 
Similarly, an antisocial network is built from the messages
classified as spam, exchanged between all users. An in- 
ternal social network is built by considering internal users 
exclusively involved in legitimate internal email commu- 
nication. Finally, the internal spam traffic [HI is used 
to build an internal antisocial network. In general these 
networks are directed. We note that messages exchanged 
through legitimate mailing lists, which also involve bulk 
email traffic, may exhibit antisocial characteristics. As 
in aiming at minimizing the impact of such commu- 
nication patterns in our analysis, we remove users who 
exchange emails with fifty or more other users from our 
internal social network. 
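
As an illustration only, the construction of the four networks might be sketched in Python as follows. The record layout `(sender, recipient, is_spam)`, the `is_internal` predicate and the function name `build_networks` are assumptions introduced for this sketch and are not taken from the original analysis; the fifty-contact cutoff follows the text.

```python
from collections import defaultdict

def build_networks(messages, is_internal, max_contacts=50):
    """Return four directed, weighted edge maps keyed by network name.

    messages: iterable of (sender, recipient, is_spam) tuples (assumed layout).
    is_internal: predicate telling whether an address belongs to the domain.
    """
    names = ("social", "antisocial", "internal_social", "internal_antisocial")
    nets = {name: defaultdict(int) for name in names}
    for sender, recipient, is_spam in messages:
        kind = "antisocial" if is_spam else "social"
        nets[kind][(sender, recipient)] += 1
        if is_internal(sender) and is_internal(recipient):
            nets["internal_" + kind][(sender, recipient)] += 1

    # Remove bulk users (e.g. mailing lists) from the internal social
    # network: anyone in contact with max_contacts or more other users.
    contacts = defaultdict(set)
    for u, v in nets["internal_social"]:
        contacts[u].add(v)
        contacts[v].add(u)
    bulk = {u for u, c in contacts.items() if len(c) >= max_contacts}
    nets["internal_social"] = defaultdict(int, {
        (u, v): w for (u, v), w in nets["internal_social"].items()
        if u not in bulk and v not in bulk})
    return nets
```

Each map stores the number of messages per directed link, so both the directed graphs and their undirected versions can be derived from it.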

Our four networks are built from a thirty-day log in- 
cluding 562664 messages, of which 270491 are spam. The 
set consists of 19504 internal and 259069 external users. 
Of these, 164998 external users are senders of spam, while 



that number is only 721 for those internal to the domain, 
most of them under fabricated identifiers. Also note that 
the number of users in our log is orders of magnitude 
larger than those included in several previously analyzed
datasets flR fl6|. 

Ebel, Mielsch and Bornholdt [15] analyzed a similarly
constructed email network, although without drawing the 
distinction between spam and legitimate traffic. They 
characterized the distribution of the degree $k$ for the entire
graph as a power law $P(k) \propto 1/k^{\alpha}$, with exponent
$\alpha = 1.81$. For the network composed exclusively of in-
ternal users they found a smaller exponent $\alpha = 1.32$.
Similarly we find power law degree distributions for the
undirected versions of our four networks, with expo-
nents $\alpha = 1.82$ ($R^2 = 0.942$) for the full social network,
$\alpha = 2.03$ ($R^2 = 0.925$) for the entire antisocial network
(see Figure 1), and $\alpha = 1.22$ ($R^2 = 0.958$) and $\alpha = 1.79$
($R^2 = 0.831$) for the internal social and antisocial net-
works, respectively. It is remarkable that our results are
broadly consistent with those of [15], for entirely different
data. We find a tendency for the exponent to be larger
for antisocial behavior, which suggests that the true so-
cial exponent may be overestimated if the two traffics
are not separated. The lower values of $R^2$ for the antiso-
cial networks suggest that the power law model is more
adequate to represent social networks than their antiso-
cial counterparts. Despite these differences, the degree
distribution is a weak discriminator between social and
antisocial behavior and is clearly affected by incomplete




FIG. 2: Distribution of the clustering coefficient for social, 
antisocial networks and their corresponding randomized net- 
works with preserved degree sequence for the internal net- 
works (top) and complete networks (bottom) built from mes- 
sages exchanged between all users, internal and external. 



knowledge of parts of the network, which is a considera- 
tion whenever external users are included. Such lack of 
knowledge results in the incorrect shift of external users 
to lower degree, and consequently leads to larger esti-
mates of the exponent $\alpha$. Thus both the failure to ex-
clude spam traffic and the incomplete knowledge of links
between external users contribute to overestimation of
the exponent $\alpha$.
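
The quoted $R^2$ values suggest that the exponents were obtained from a linear fit on logarithmic axes, although the fitting procedure is not stated in the text. Under that assumption, a minimal sketch of such an estimate is given below; the function name `fit_power_law` and the choice of an unbinned least-squares fit are hypothetical.

```python
import math
from collections import Counter

def fit_power_law(degrees):
    """Least-squares fit of log P(k) = c - alpha * log k.

    Returns (alpha, R^2). Assumes at least two distinct positive degrees
    with different frequencies.
    """
    counts = Counter(k for k in degrees if k > 0)
    n = sum(counts.values())
    xs = [math.log(k) for k in counts]
    ys = [math.log(c / n) for c in counts.values()]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    r2 = sxy * sxy / (sxx * syy)   # squared linear correlation on log-log axes
    return -slope, r2
```

Fits of this kind are sensitive to the sparsely sampled high-degree tail, which is one more reason to treat the resulting exponents as indicative rather than exact.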

Next, we recall that according to Newman and Park Q, 
high clustering coefficient and positive assortative mix- 
ing are two graph theoretical quantities typical of social 
networks. Therefore, we investigate whether these two 
structural properties of email graphs can distinguish the 
social imprint of legitimate email communication from 
the antisocial characteristics of spam. In order to do so 
we compare the average values of these network mea- 
sures determined for networks constructed from actual 
data with corresponding values obtained for networks 
with randomized links, with the same degree sequence. 
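
The comparison just described can be sketched as follows. The text does not say how the links were randomized; repeated double-edge swaps, used here, are one common way of preserving the degree sequence and are an assumption of this sketch, as are the function names and the adjacency-set representation (node labels must be hashable).

```python
import random

def average_clustering(adj):
    """Mean local clustering coefficient of an undirected graph.

    adj maps each node to the set of its neighbours. Nodes with fewer
    than two neighbours contribute C = 0 to the average.
    """
    total = 0.0
    for u, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            continue
        nb = list(nbrs)
        links = sum(1 for i in range(k) for j in range(i + 1, k)
                    if nb[j] in adj[nb[i]])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj)

def randomize(adj, swaps, seed=0):
    """Return a rewired copy of adj with every node degree preserved."""
    rng = random.Random(seed)
    adj = {u: set(v) for u, v in adj.items()}
    seen, edges = set(), []
    for u in adj:
        for v in adj[u]:
            if frozenset((u, v)) not in seen:
                seen.add(frozenset((u, v)))
                edges.append((u, v))
    for _ in range(swaps):
        i, j = rng.randrange(len(edges)), rng.randrange(len(edges))
        a, b = edges[i]
        c, d = edges[j]
        if rng.random() < 0.5:
            c, d = d, c
        # Propose a-d and c-b; reject self-loops and duplicate links.
        if len({a, b, c, d}) < 4 or d in adj[a] or b in adj[c]:
            continue
        adj[a].remove(b)
        adj[b].remove(a)
        adj[c].remove(d)
        adj[d].remove(c)
        adj[a].add(d)
        adj[d].add(a)
        adj[c].add(b)
        adj[b].add(c)
        edges[i], edges[j] = (a, d), (c, b)
    return adj
```

Comparing `average_clustering(adj)` with the same quantity averaged over several calls to `randomize` with different seeds gives the real/random contrast used in the analysis.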

Indeed, considering the undirected versions of our net-
works, the average clustering coefficient over the inter-
nal social network is $C = 0.241 \pm 0.008$, whereas the
clustering coefficient in the internal antisocial network is
much lower, at $C = 0.052 \pm 0.006$. These results com-
pare to the clustering coefficient of internal domain users
of $C = 0.154$, found by Ebel et al. [15]. Considering the
networks that include external users, whose neighbors are
only known incompletely, we find $C = 0.137 \pm 0.003$ for
the social network and $C = 0.026 \pm 0.001$ for the antiso-
cial network, in contrast with $C = 0.003$ for the entire
network of Ebel et al. [15]. Figure 2 shows the distribu-
tion of the clustering coefficient for social, antisocial and
their corresponding random networks.

All four networks contain a significant fraction of their 
nodes with vanishing clustering coefficient, but this pro- 
portion is much higher for graphs that include external 
users and/or antisocial components. Specifically, 61% of 
all nodes in the entire social network have C = 0, while 
this becomes more than 81% for the entire antisocial com- 
ponent. The internal social network has only 25% of its 
nodes with C = 0, compared to 73% for the internal an- 
tisocial network. These features indicate that there are 
clear differences on average between clustering in the so-
cial and antisocial components of email networks, but
also that low clustering is not a sufficient condition for a 
node to be associated with antisocial behavior. Similarly 
to the analysis of the degree distribution these results 
also indicate that the separation of the two traffics is im- 
portant in order to identify the truly social component. 
Failure to do so will result in the underestimation of the 
average social network transitivity. 

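We now analyze the nature of degree correlations be-
tween nodes by computing the corresponding Pearson
correlation coefficient [TtJ

$$ r = \frac{\sum_i j_i k_i - M^{-1} \sum_i j_i \sum_{i'} k_{i'}}{\sqrt{\left[\sum_i j_i^2 - M^{-1} \left(\sum_i j_i\right)^2\right] \left[\sum_i k_i^2 - M^{-1} \left(\sum_i k_i\right)^2\right]}}, \qquad (1) $$

where $j_i$ and $k_i$ are the excess in-degree and out-degree
of the vertices that the $i$th edge leads into and out of,
respectively, and $M$ is the total number of edges in the
graph.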
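
A Pearson coefficient of this kind can be evaluated directly from a list of directed links. The sketch below is an illustration under stated assumptions: each distinct directed link appears once in `edges`, degrees count distinct links, and the function name `degree_correlation` is hypothetical.

```python
import math
from collections import Counter

def degree_correlation(edges):
    """Pearson r between the excess in-degree of the node each edge leads
    into and the excess out-degree of the node it leads out of.

    edges: list of (source, target) pairs, one per directed link.
    Undefined (division by zero) if either degree has zero variance.
    """
    out_deg = Counter(u for u, _ in edges)
    in_deg = Counter(v for _, v in edges)
    M = len(edges)
    j = [in_deg[v] - 1 for _, v in edges]    # excess in-degree of target
    k = [out_deg[u] - 1 for u, _ in edges]   # excess out-degree of source
    sj, sk = sum(j), sum(k)
    sjk = sum(a * b for a, b in zip(j, k))
    var_j = sum(a * a for a in j) - sj * sj / M
    var_k = sum(b * b for b in k) - sk * sk / M
    return (sjk - sj * sk / M) / math.sqrt(var_j * var_k)
```

A single sender reaching many low-degree recipients, as in bulk mailing, pushes the coefficient toward negative values, which is the qualitative effect discussed next.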

The expectation of assortative mixing by degree in a 
social network of email is not obvious. In fact as we ar- 
gued above, a user's degree is a very variable property, 
that can be easily changed drastically by the inclusion 
of the user's address in, or by the use of, distribution 
lists. This common use of email can create huge imbal- 
ances of degree between senders and recipients and may 
generate negative values for the Pearson coefficient even 
for groups of legitimate users. If this can be expected of 
the degree correlation in the social network, then such 
an effect should be even more pronounced in the antiso- 
cial graph. There, spam senders follow the strategy of 
increasing their degree indiscriminately and maximally, 
and consequently reach on average a population of re- 
cipients with much lower degree, which are statistically 
much more abundant for a scale free degree distribution. 

These qualitative expectations are borne out by esti-
mation of $r$. Using Eq. (1) we computed the Pearson coef-
ficient $r$ for each of the four directed networks, and ob-
tained $r = -0.135$ for the entire social network (with
$r = -0.082$ for its corresponding randomized network),
$r = -0.139$ ($-0.111$) for the entire antisocial network,
and $r = 0.232$ ($0.095$) and $r = 0.049$ ($0.073$) for the so-
cial and antisocial internal networks, respectively. Stan-
dard errors are smaller than 1%. Moreover, we observed
that the positive value of r for the internal social net- 
work is the result of an approximately linear correlation 
between the out degree of the sender and the in degree of 
the recipient. Such systematic correlation across degree 
is absent for the other three networks, with the differ- 
ence that for networks containing external users there is 
an average imbalance between the degrees of senders and 
recipients that leads to a negative $r$. As we can see from
the $r$ values, the social networks show significantly stronger
assortativity (internal social network) and disassortativ-
ity (social network) than their corresponding randomized
networks. On the other hand, there is a much less signifi- 
cant difference between the assortativity of real networks 
and their corresponding randomized versions in the an- 
tisocial case. 

We conjecture that the more negative Pearson coeffi- 
cient for the complete social network, which includes ex- 
ternal users, is the result of the widespread subscription 
to legitimate distribution lists, such as those related to 
news, promotions, etc. [30]. We verified to the extent possi-
ble, given that email user identifiers are made anonymous
but domains are present, that external distribution lists 
are the main source of degree imbalance for the external 
social network. 

In summary, we see that the consideration of this set 
of standard network measures places networks of email 
communication in a unique position. On the one hand, 
the legitimate component of a completely known email 
network shares its transitivity and positive degree corre- 
lation properties with other social networks. Unlike some 
other social networks however its degree distribution is 
scale free and characterized by a small exponent, which 
implies that, although the distribution remains normal-
izable, no finite moments exist as the network size goes
to infinity ($2 > \alpha > 1$). This property is a direct re-
sult of the low cost of adding additional recipients to a
message, and makes statistical estimation of degree cor- 
relations over email networks very sensitive and network 
size dependent, if not altogether ill defined. 

In spite of these properties, the antisocial network, built
from the exchange of spam messages, has definite proper-
ties, showing negligible transitivity and assortative mix-
ing close to those of its corresponding random network with pre-
served degree sequence. Moreover, our analysis shows
that, in contrast to previous expectations 0|, social email 
networks involving users that are external to the local 
domain may present a negative degree correlation, pre- 
sumably reflecting in part the incomplete knowledge of 
external links, but also resulting from message exchanges 
characteristic of email, such as the widespread subscrip- 
tion to legitimate distribution lists. 

These differences suggest mechanisms to differentiate 
legitimate human collaboration from opportunistic be-
havior on the basis of network structure, and have in-
deed been proposed as the basis for spam detection al-
gorithms [19, 20]. However, much remains unsatisfactory
about the transitivity and assortative mixing measures as 
means to characterize patterns of human communication. 
The most serious flaw is that their estimation relies on 
the knowledge of all neighbors of each node. This is not 
possible beyond a small subset, corresponding to users in 
the local domain; a general problem of the construction 
of any network. A solution to this problem is the con- 
sideration of quantities that characterize the dynamics of 
communication links between senders and recipients di- 
rectly, without reference to third parties. In other words, 
it is key to investigate whether the social and antisocial 
nature of a given node can be inferred from its dynamical 
behavior, even given incomplete knowledge of the social 
network of all its neighbors. 



III. TEMPORAL PATTERNS OF EMAIL 
COMMUNICATION 

We start with the simplest measure of communication 
between two users: reciprocity [26]. We build a simple
coefficient of preferential exchange $E_i$ for user $i$ as:

F E jec< W -» - - J)] m 

where $C_i$ is the set of all users that have contact with
user $i$ within a given time period, and $k(j \to i)$ is the
number of messages sent by user $j$ to $i$. Therefore,
$0 \le E_i \le 1$, with the lower end corresponding to no
message being replied to, and the upper end to every
message obtaining a response. This can be further aver-
aged over all users to generate network expectation values
$\langle E \rangle$. Considering internal as well as external users, we
find $\langle E \rangle = 0.0329 \pm 0.0005$ in the social network, whereas
a significantly lower $\langle E \rangle = 0.00007 \pm 0.00002$ is observed
in the antisocial network. Values of $\langle E \rangle = 0.2757 \pm 0.0083$
and $\langle E \rangle = 0.0625 \pm 0.0056$ are found in the internal social
and antisocial networks, respectively. Therefore, antiso-
cial networks are naturally associated with small (but po-
tentially non-zero) reciprocity, whereas social networks,
particularly those containing legitimate users whose be-
havior we know completely, are associated with the high-
est reciprocity.

Up to this point we concentrated on the structure of 
the network of interactions mediated by email messages. 
In its construction as a graph we have not paid attention 
to the detailed temporal structure of message exchanges. 
An interesting question then is whether the dynamical 
properties of email traffic can distinguish different types 
of social relations. This question has recently become a 
subject of interest. Eckmann, Moses and Sergi [14] have
shown that coherent structures emerge from the tempo- 
ral correlations between time series expressing short peri- 
ods of intense message exchange between groups of users. 
Barabasi [21], on the other hand, has shown that the dis-
tribution of time intervals between email messages sent
Network measure                               Internal social   Internal antisocial   Social            Antisocial
Degree distribution ($\alpha$)                1.22              1.79                  1.82              2.03
Clustering coefficient (real/random)          0.2409/0.0188     0.0521/0.0103         0.1374/0.0089     0.0261/0.0124
Assortative mixing (real/random)              0.2324/0.0946     0.0493/0.0727         -0.1347/-0.0824   -0.1387/-0.1110
Preferential exchange ($\langle E \rangle$)   0.27568           0.06246               0.03288           0.00007

TABLE I: Summary results for structural measures applied to the Social (legitimate email) and Antisocial (spam exchange)
total networks and to those restricted to internal traffic within the domain.



by a single user may be well described by a power law dis-
tribution $P(t) \sim t^{-\gamma}$ with $\gamma = 1$, with bursts of activity
alternating with long silences.

Both these characterizations identify properties of le- 
gitimate email traffic - temporal correlations between 
users and inter-message time statistics - that are thought 
to be exclusively social and thus not shared by the anti- 
social traffic component. In fact intense email exchanges 
between small groups of users are to be expected in pat-
terns of human communication, creating the correlations
observed by Eckmann, Moses and Sergi [14]. Barabasi in 
turn suggests that the power law statistics he observed 
can be explained in terms of a queueing model which 
encodes prioritization of tasks driven by human decision 
making. 

Although suggestive, these interesting results were ob- 
tained for selected senders and receivers of email. Con- 
sequently it remains unclear whether they hold for the 
general user or for aggregated groups of users. We have 
in fact attempted to verify Barabasi's findings in our log 
but obtained mixed results with some users showing the 
suggested power law behavior and others manifestly not, 
see Fig. 3. Similar results were reported in Ref. [25].

To evade effects of variability associated with individ-
ual users, we chose to investigate the statistics of our
social and antisocial aggregate traffics through averaging
over the behavior of all users in each class. The first
obvious temporal property of email traffic is its non-sta-
tionarity, see Fig. 4. This feature creates difficulties for
any attempt at statistical estimation. Social email traffic
in particular shows large temporal variations, from night
to day, working days to weekends, and for our data set,
strong seasonality associated with the academic calen-
dar. Antisocial traffic displays weaker non-stationarity,
see Fig. 4.

The second temporal feature of email traffic is an im- 
mediate result of the power law degree distributions de- 
scribed above. The majority of users do not communicate 
often with many others, but have instead low degree as- 
sociated with an infrequent and often irregular usage of 
email. This means that the typical email user in our data 
- and, we believe, in most other large email networks - 
does not show time coherence with others, nor is he/she 
necessarily under the constraints of temporal optimiza- 
tion suggested by Barabasi. 

To circumvent some of these difficulties, we attempted 
to identify statistical temporal patterns of communica- 
FIG. 3: The distribution of the intervals between sent mes-
sages for two of the most active senders of legitimate email.
While Figure 3 (top) shows approximate power law statistics
with exponent $\gamma = 0.55$ ($R^2 = 0.96$), the distribution of Fig-
ure 3 (bottom), for a different user, is better described by a
log-normal distribution.



tion that are characteristic of the social vs. antisocial 
aggregated traffics. In so doing we average over the be- 
haviors of many users. Specifically, we represent tempo- 
ral patterns of message arrival through the definition of 
a state in terms of a communication word of size $L$. The
dimension $L$ is the number of time intervals, or letters,
in the communication word, which is written as a vector
$W = [i_1, i_2, \ldots, i_L]$. The simplest representation of the
traffic is through a binary assignment, where the value of
$i_j$ is set to 1 if one or more messages were exchanged in
the corresponding time interval, or $i_j = 0$ otherwise, i.e.



$$ W = [0\,1\,0\,0\,1 \ldots 0\,1], \qquad (3) $$



where there are L boolean variables, each corresponding 
FIG. 4: Temporal power spectrum of the legitimate (top) and
spam (bottom) traffics aggregated over all users in each class. 
Daily and weekly periods are features of both traffics, but 
dominate legitimate email exchange. The power spectrum of 
spam traffic is more uniform at short times. 



to the exchange, or not, of a message in consecutive time
periods $\Delta t$. For stationary processes a message
exchange occurs with a fixed probability per
unit time. The representation of time series in terms of
binary words is familiar from other contexts in physics
and information theory [12], from the analysis of the
time evolution of dynamical systems, to trains of action
potentials in neuronal activity [24] or bit streams in noisy
communication channels. The entropy of the word distribution
and its variation with the word size $L$ give us in fact
some of the essential properties of the dynamical rules
that generate these dynamical patterns [22, 23].
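
The binary-word representation can be sketched as follows. The function names `binarize` and `word_counts`, the use of numeric timestamps in seconds, and the choice of consecutive non-overlapping windows are assumptions made for illustration; the text does not specify whether words overlap.

```python
from collections import Counter

def binarize(timestamps, dt, t0, t1):
    """Binary series over [t0, t1): a letter is 1 if at least one message
    falls in the corresponding bin of width dt, and 0 otherwise."""
    n_bins = int((t1 - t0) // dt)
    bits = [0] * n_bins
    for t in timestamps:
        idx = int((t - t0) // dt)
        if 0 <= idx < n_bins:
            bits[idx] = 1
    return bits

def word_counts(bits, L):
    """Frequencies of the length-L words read from consecutive,
    non-overlapping windows of the binary series."""
    return Counter(tuple(bits[i:i + L])
                   for i in range(0, len(bits) - L + 1, L))
```

Normalizing these counts gives the empirical word distribution whose entropy is studied as a function of $L$.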

To illustrate these statements consider the simplest 
statistical model that generates a binary time series sub- 
ject to a given message arrival rate p. Then p can be 
written as the probability to obtain a 1 at each letter. 
If we further assume that bits corresponding to different 
letters are uncorrelated then the bit value at each letter 
can be regarded as the result of an independent Bernoulli 
trial. 

Under these assumptions the probability of a given
number of events $k$ in $L$ trials (bins) is well known to
be given by the binomial distribution

$$ f(k; L, p) = \binom{L}{k} p^k (1-p)^{L-k}. \qquad (4) $$

Moreover the probability of a sequence with the same
number of events is the same regardless of their order, as
each occurrence is independent for different bins. Thus
to obtain the probability for a particular sequence of $k$
events in $L$ bins we must divide by the number of possible
arrangements $\binom{L}{k}$. Then the probability for a particular
sequence or binary word with $k$ ones and length $L$ is

$$ p_W = \frac{p^k (1-p)^{L-k}}{\sum_{k'=0}^{L} \binom{L}{k'} p^{k'} (1-p)^{L-k'}} = p^k (1-p)^{L-k}. \qquad (5) $$

Because all words with a given number $n$ of 1s are equally
likely, their probability is $p_W(n; L, p) = p^n (1-p)^{L-n}$.
This implies that the Shannon entropy of the time series
can be written as

$$ H = -\sum_W p_W \log_2 (p_W) = -\langle k \rangle \log_2 \frac{p}{1-p} - L \log_2 (1-p) = m L, \qquad (6) $$

with $\langle k \rangle = Lp$. Thus, in the absence of temporal corre-
lations, the Shannon entropy is a strictly linearly grow-
ing function of the word length $L$, with slope
$m = -(1-p) \log_2 (1-p) - p \log_2 p > 0$.
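
The linear growth of the entropy with $L$ can be checked numerically by summing $-p_W \log_2 p_W$ over all $2^L$ binary words and comparing with $mL$. The sketch below is only a consistency check of the expressions above; the function names are hypothetical and $0 < p < 1$ is assumed.

```python
import math
from itertools import product

def imm_entropy_bruteforce(L, p):
    """Entropy obtained by summing -p_W log2 p_W over all 2**L words,
    with p_W = p**k * (1 - p)**(L - k) for a word containing k ones."""
    h = 0.0
    for word in product((0, 1), repeat=L):
        k = sum(word)
        pw = p ** k * (1 - p) ** (L - k)
        h -= pw * math.log2(pw)
    return h

def imm_entropy(L, p):
    """Closed form H = m L with m = -(1-p) log2(1-p) - p log2 p."""
    m = -(1 - p) * math.log2(1 - p) - p * math.log2(p)
    return m * L
```

The two functions agree to numerical precision for any word length small enough to enumerate.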

These expressions become especially simple if the tem-
poral bin for each letter is chosen such that $p = 1/2$, in
which case $m = 1$ is maximal. This independent mes-
sage model (IMM) is the maximal entropy distribution
for a traffic characterized by an average message arrival
probability $p$. Real traffics, which show temporal struc-
ture, must therefore display lower entropy relative to the
idealized IMM message stream. We refer to the differ-
ence between the traffic entropy and the corresponding
$H_{\rm IMM}(L)$, measured with the same choice of average $p$,
as the traffic's structural information, for a given $L$.
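
A minimal sketch of how the structural information might be estimated from a binary series is given below. It assumes non-overlapping words and estimates $p$ as the fraction of occupied bins; both choices, and the function name `structural_information`, are assumptions of this illustration.

```python
import math
from collections import Counter

def structural_information(bits, L):
    """Estimate H_IMM(L) - H(L) in bits for a binary series.

    H(L) is the entropy of the empirical distribution of length-L words;
    H_IMM(L) = m L uses the same average occupation probability p.
    """
    words = Counter(tuple(bits[i:i + L])
                    for i in range(0, len(bits) - L + 1, L))
    n = sum(words.values())
    h = -sum(c / n * math.log2(c / n) for c in words.values())
    p = sum(bits) / len(bits)
    m = 0.0
    for q in (p, 1 - p):
        if q > 0:          # the limit of q log2 q as q -> 0 is 0
            m -= q * math.log2(q)
    return m * L - h
```

With a finite record the number of sampled words falls as $L$ grows, so the empirical entropy is underestimated and the structural information is biased upward for long words.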

Figure 5 shows the difference between the entropy of
the independent message model and the real traffics, le-
gitimate and spam. We aggregated the data into two
temporal periods: work hours (i.e., the period from
8 AM to 8 PM on weekdays, except holidays, in the
log) and remaining times, which we refer to as non-work
hours.

The results show that the social email traffic has lower 
entropy (higher structural information) than the antiso- 
cial traffic for both work and non-work periods. This 
difference becomes more noticeable the larger the word, 
thus capturing longer patterns of communication and the 
presence of time correlations. The difference between the 
independent message model, where for $p = 1/2$ all words
are equally likely, and the real traffics is that in the latter,
words with many 1s (0s) are suppressed while the prob-
ability of words with two to three 1s separated by one
to three 0s is enhanced. The difference between social
and antisocial traffics is more subtle, with social email
traffic displaying a greater probability for words with an
isolated message in a long stream of silence. These struc-
tures are reminiscent of those found by Barabasi [21], but
display less definitive statistical signatures. Nevertheless,
we see that both social and antisocial traffics are far from 
random, and that social email shows stronger temporal 
FIG. 5: The variation of the difference between the indepen-
dent message model entropy $H_{\rm IMM}(L)$ and the entropy of the
legitimate and spam traffics $H(L)$, with word size $L$, during
work (top) and non-work (bottom) periods. All word proba-
bility distributions were constructed by normalizing the time
bin for each letter so that $p = 1/2$. As a result the
time bin for each letter of the social traffic during work hours
was set to 4 s, and 11 s for the corresponding non-work period.
Time bins for the antisocial traffic were set at 4 s during work
hours and 5 s otherwise. The slight excess curvature for large
$L$ is the result of poorer estimation of rare long words.